Modeling the scaling properties of human mobility 
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While the fat tailed jump size and the waiting time distributions characterizing 
individual human trajectories strongly suggest the relevance of the continuous time 
random walk (CTRW) models of human mobility, no one seriously believes that human 
traces are truly random. Given the importance of human mobility, from epidemic 
modeling to traffic prediction and urban planning, we need quantitative models that 
can account for the statistical characteristics of individual human trajectories. Here we 
use empirical data on human mobility, captured by mobile phone traces, to show that 
the predictions of the CTRW models are in systematic conflict with the empirical results. 
We introduce two principles that govern human trajectories, allowing us to build a 
statistically self-consistent microscopic model for individual human mobility. The model 
not only accounts for the empirically observed scaling laws but also allows us to 
analytically predict most of the pertinent scaling exponents. 

Uncovering the statistical patterns that characterize the trajectories humans follow 
during their daily activity is not only a major intellectual challenge, but also of importance for 
public health 1_5 , city planning 6 ~ 8 , traffic engineering 9 ' 10 and economic forecasting u . For 
example, quantifiable models of human mobility are indispensable for predicting the spread 
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of biological pathogens " or mobile phone viruses . 

In the past few years the availability of mobile phone records, GPS data, and other 
datasets capturing aspects of human mobility have given a new empirically driven 
momentum to the subject. While the available datasets significantly differ in their reach and 
resolution, the results appear to agree on a number of quantitative characteristics of human 
mobility. For example, both dollar bill tracking 13 and mobile phone data 14 indicate that the 
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aggregated jump size (Ar) and waiting time (At) distributions characterizing human 

trajectories are fat-tailed, i.e. P(Ar) ~ |Ar| 1 a and P(At) ~ |A?|~ 1- ^ with 0<oc<2and0</?< 

1, where Ar denotes the distances covered by an individual between consecutive sightings, 
and At is the time spent by an individual at the same location. These findings suggest that 
human trajectories are best described as Levy Flights (LF) or continuous time random walks 
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(CTRW), a much studied modelling framework in the random walk (RW) community ' " . 

The purpose of the present paper is to show, using a series of direct measurements, 
that human trajectories do follow several highly reproducible scaling laws. Yet, many of 
these laws are either not explained by the CTRW model, or they are in direct contradiction 
with the CTRW predictions, indicating the lack of modelling framework capable of capturing 
the basic features of human mobility. To explain the origin of the observed scaling laws, we 
introduce two principles that govern human mobility, serving as the starting point of a 
statistically acceptable microscopic model for individual human motion. We show that the 
model can account for the empirically observed scaling laws and allows us to analytically 
predict the pertinent scaling exponents. 

Scaling Anomalies — We used two datasets to uncover the patterns characterizing 
individual mobility. The first dataset (D\) captures for a one-year period the time-resolved 
trajectories of 3 million anonymized mobile phone users. Each time a user initiated or 
received a phone call the tower that routed the communication was recorded for billing 
purposes. Thus, the user's location was recorded with the resolution that is determined by the 
local tower density. The reception area of a tower varies from as little as a few hundred 
meters in metropolitan area to a few kilometres in rural regions, controlling our uncertainty 
about the user's precise location. However, since here we focus on the asymptotic scaling 
properties of human trajectories, these short distance uncertainties are not expected to affect 
our results (see Supplementary Material Section SI). The second dataset (D 2 ) uses the 
anonymized location record of 1,000 users who signed up for a location based service, thus 
their location was recorded every hour for a two week period. As a first step we calculated 

the displacement at hourly intervals, finding P(Ar) ~ |Ar| with a = 0.55±0.05 and an 

expected cutoff at Ar ~ 100 km, corresponding to the distance people could reasonably cover 
in an hour. We used the £>2 dataset to measure P{At), where the waiting time At is defined as 

the time a user spent at one location. We find that P(At) follows P(At) ~\At\ 1 ^ with /? = 
0.8±0.1 and a cutoff of At = 17 hours, likely capturing the typical awake period of an 
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individual. Taken together, the fat-tailed nature of P(Ar) and P(At) suggest that human follow 
a CTRW during their daily mobility. Next we discuss three empirical observations that 
indicate that human trajectories follow reproducible scaling laws, but also illustrate the 
shortcoming of the CTRW model in capturing the observed scaling properties: 

(A) The number of distinct locations S{t) visited by a randomly moving object is expected to 
follow 21 " 23 

sms, (i) 

where /u = 1 for Levy flights 24 and fx = for CTRW. Interestingly, our measurements indicate 
that for humans pi = 0.6±0.02 (see Fig. la), smaller than the CTRW prediction of/? = 0.8±0.1. 
The fact that \x < 1 indicates a slow-down at large time scales, a deceasing tendency of the 
user to visit previously unvisited locations. 

(B) Visitation frequency: The probability /of a user to visit a given location is expected to be 
asymptotically (/— >oo) uniform everywhere if ~ const.) for both LF and CTRW. In contrast, 
the visitation patterns of humans is rather uneven, so that the frequency / of the k most 
visited location follows Zipf s law 14 

fk~k< (2) 
where C~ 1-2 ± 0.1 (see Fig. lb). This suggests that the visitation frequency distribution 
follows P(f) ~f< l+v ®. 

(C) Ultra-slow diffusion: The CTRW model predicts that the mean square displacement 
(MSD) asymptotically follows (a* 2 (o}~* v with v = 2fi/a « 3.1. Since both P(Ar) and P(At) 
have cutoffs, asymptotically the MSD should converge to a Brownian behaviour with v = 1 . 
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However, this convergence is too slow to be relevant in our observational time frame. 
Either way, CTRW predicts that the longer we follow a human trajectory, the further it will 
drift from its initial position. Yet, humans have a tendency to return home on daily basis, 
suggesting that simple diffusive processes, that are not recurrent in two dimensions, do not 
offer a suitable description of human mobility. Indeed, our measurements indicate an ultra- 
slow diffusive process, in which the MSD appears to follow a slower than logarithmic growth 
(see Fig. lc and Ref. 14). Such ultra-slow growth of the MSD is rare in diffusion, having 
been observed before only in a few disordered systems, from glasses (for example the Sinai 

76 77 78 

model ) to polymers and iterated maps . 

On one end, the findings summarized in A - C indicate that individual human mobility 
does follow reproducible scaling laws, whose origins remain to be uncovered. Yet, they also 
document systematic deviations from the predictions of the LF or CTRW based null models. 
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The main purpose of this paper is to offer a model that not only explains the origin of the 
anomalies A - C, but also leads to a self-consistent statistical model of individual human 
mobility. 

Generic Mechanisms and Individual Mobility Model — As we build our model, 
we will take for granted the observations that the jump size P(Ar) and the waiting time P(Af) 
distributions characterizing individual human trajectories are heavy tailed, a phenomenon 
addressed by a series of models " . Yet, P(Ar) and P(At) alone are not sufficient to explain 
the scaling laws A - C. We propose that the main reason for this discrepancy is that two 
generic mechanisms, exploration and preferential return, both unique to human mobility, are 
missing from the traditional random walk (LF or CTRW) models: 

(1) Exploration: Random walk models assume that the next diffusive step is 
independent of the previously visited locations. In contrast, the scaling law (1) indicates that 
the tendency to explore additional locations decreases with time. Indeed, the longer we 
observe a person's trajectory, the harder is to find locations in the vicinity of their 
home/workplace that they have not yet visited. 

(2) Preferential Return: In contrast with the RW based models for which the visitation 
probability is random and uniform in space, humans show significant propensity to return to 
the locations they visited frequently before, like home or workplace. 

In what follows we present an individual mobility (IM) model that incorporates 
ingredients (1) and (2), showing that they are sufficient to explain the anomalies A - C. The 
model, intended to describe the trajectory of an individual, assumes that at time t = the 
individual is at some preferred location (see Fig. 2). After a waiting time At chosen from the 
P(Af) distribution, the individual will change its location. We assume that the individual has 
two choices: 

(i) Exploration: With probability 

P ne , = pS- y (3) 
the individual moves to a new location (different from the S locations it visited before). The 
distance Ar that it covers during this exploratory jump is chosen from the P(Ar) distribution 
and its direction is selected to be random. As the individual moves to this new position, the 
number of previously visited locations increases from S to S+l . 

(ii) Preferential Return: With the complementary probability P vet = 1 - pS~ y the 
individual returns to one of the S previously visited locations. In this case, the probability n, 
to visit location i is chosen to be proportional to the number of visits the user previously had 
to that location. That is, we assume that 
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n ; =fi, 



(4) 



an assumption known as preferential attachment or cumulative advantage in network and 
social science " . 

Model Predictions — The IM model has two parameters, < p < 1 and y > 0, both of 
which control the user's tendency to explore a new location during its next move vs. 
returning to a previously visited location. The numerical values of these two parameters will 
be determined later from the empirical data. 

Anomaly A: To explain the origin of the anomaly A, we note that in the IM model the 

probability that an individual moves to a new location is proportional to S~ y , i.e. dSI dn<x,S~ r 
, predicting S ~ n l{l+y \ where n is the total number of discrete moves the individual had up to 

time t. For a fat-tailed waiting time distribution P{&) ~\kt\ 1 ^ the time t scales with the 

number of jumps n as t ~ n 1/p (Section S3A in Supplementary Material), obtaining that S(t) 
follows (1) with exponent 



To verify the validity of this prediction for the IM model, in Fig. Id we calculated S(t) 
numerically for different values of /?, finding that the asymptotic scaling exponent of S(t) 
follows Eq. (5). Therefore, we predict that p < /?, in line with the empirical data. 

Anomaly B: To account for anomaly B we notice that m t , the number of visits to 
location i, increases as dmjdn = 11,(1 -P new ), where IT = ft = mJ'Limiin) is the probability to 
return to the location i during step n. When y > 0, in the limit of S(t)^<x) the probability to 
explore a new location is negligible compared to the return visits, thus asymptotically we 
have dmildn = mil'Lwiin). Since S; m ; (n) = n, we obtain mi{n)= nlm, where n t denotes the jump 
during which location i was first visited, at which moment m,(«0 = 1 . Thanks to preferential 
return (4), the earlier a location is visited, the more it is visited later on. Thus the ranking k t 
for location i coincides with the order in which it was first visited, i.e. k t = S{nl) ~ n, Xi{ - l+y \ 
Since the visitation frequency^? is proportional to min)= nln t , we have fk ~ k'^ with exponent 
C= 1 + y. In general, we find (Section S3C in Supplementary Material) 



Note that for y = and p = 1 Eq. (6) predicts C = 0, indicating, as expected, that the visitation 
becomes homogeneous in the CTRW limit. 

To test the validity of prediction (6), in Fig. le we measured fk for different values of y 
and p in the IM model, finding that the numerically observed scaling behaviour is in 



[i=/3/(l+y). 



(5) 




(6) 
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agreement with Eq. (6). The model also suggests that the observed Zipf s law is rooted 
mainly in the preferential return (4). Indeed, if we calculate^ for a "democratic" model in 
which IX is independent of fi (i.e. the individual visits the previously visited locations with 
the same likelihood), Zipf s law vanishes (see Fig. le). 

Anomaly C: To understand the origin of anomaly C, we note that the number of 
jumps to new locations / relates to the displacement Ar as Ar ~ l 1/a (Section S3A in 

Supplementary Material), suggesting that (Ar 2 ^ ~ (l 2la ^ = l 2la P(l | 5), where P(l\S) is 

the probability that the 5 th location is / steps away from the starting point of the individual. 
Note that / is different from n, as / counts only moves that result in a jump to some new 

i=x PV ~ 1 1 k)fk , where 

each term within the sum represents a jump from the k th location to 5 th location. Hereof is the 
probability of visitation of the k th location given that the total number of locations visited 
previously is 5-1, and is well approximated by Zipfs law (2) f£ ~ (£-\)/(\-S 1 ^) k'^. By 
applying this approximation (see Section S3D in Supplementary Material), we find 
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which relates the MSD to the number of distinct location 5 visited by the user. In Figs, lc 
&lf, we plot the MSD vs. 5 for different values of £ finding that Eq. (7) agrees well with the 
numerical results. Furthermore, in the inset of Fig lc we plot the prediction (7) against the 
empirical data, again with excellent agreement. 

To determine the explicit time dependence of MSD, we use Eqs. (1) and (5) to predict 
three possible scaling regimes for large t (or 5): 

1 (T 

(a) For £ < 1 (y = and p < 1), 5 s diverges and thus the MSD grows with time as (log 
t) 2la . In this regime the visitation frequency P(f) ~ has exponent greater than two and 
thus the diffusion is dominated by the infrequently visited locations. 

(b) For £= 1, we have (C-l)/(l-S l< ) = log(5), and thus the MSD ~ log(log(0) 2/a . 

1 c 

(c) For £ > 1,5 approaches zero for large 5 and thus the MSD is expected to 
saturate. This is because P(f) decays with an exponent whose value is less than two and thus 
the individual's motion is dominated by his/her most visited location. 

To compare further the IM model with the empirical data, we first need to test the 
validity of the hypotheses (i) - (ii) as formulated by Eqs. (3) and (4). To do this, we used the 
Di dataset and measured the rate at which the users visit new locations (Fig. 3a). We find that 
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^new x S 7 with y = 0.21±0.02, a result that not only confirms the validity of hypothesis (3), 
but also provides the numerical value of the exponent y. To test the validity of the hypothesis 
(ii), in Fig. 3b we plot the probability II that a user returns to a previous location in function 
of the previous visitation frequency / of this location. The plot indicates that II = f, 
confirming the validity of preferential return (4). Furthermore, the ratio between P new and S 7 
measures the parameter p for each user, allowing us to plot P(p) for our user group, finding 
that P(p) approximately follows a normal distribution with the mean at < p > ~ 0.6. Note that 
the scaling properties of the IM model do not depend on p for y > 0, thus while the precise 
value of p is important to parameterize the model, it does not affect the scaling laws discussed 
in A - C. 

Finally, to match our model based prediction with the empirical data, it is necessary to 
inspect the relationship between the exponents characterizing the model and those observed 
for real human mobility. In this respect our starting point are the three independently 
determined exponents a = 0.55±0.05, /? = 0.8±0.1 and y = 0.21±0.02. Equation (5) predicts p 
= 0.67±0.07, in agreement within the error bar with the empirical value p = 0.6±0.02 (Fig la). 
Furthermore, Eq. (6) predicts C= 1.21±0.02, again in excellent agreement with the empirical 
value C = 1.2±0.1. Moreover, Eq. (5) and Eq. (6) predict the existence of the scaling 
relationship /? = pC between three empirical exponents (valid for <f > 1), which is again 
consistent with empirical data (Fig. lb). Finally, when it comes to anomaly C the analytical 
predictions offer three different scaling regimes, determined by the value of C The empirical 
data indicates the Zipf exponent C =1.2±0.1, for which we predict a saturation in the MSD. 
To understand how the system reaches saturation, we expand C around C = 1 > finding that in 
the transient regime the MSD scales as (loglog(^)) 2 a . This prediction is valid only if S(t) < 
exp(l/(^-l)) ~ 148, which is true for 89% of the users that have S < 148 over the one year 
period. This prediction is consistent with the empirical data shown in Fig. lc, documenting a 
slower than logarithmic growth of the MSD. Note that in the empirical data we do not 
observe the predicted saturation of the MSD, potentially due to the finite time frame (1 year) 
used in the study. To estimate the saturation time, we extrapolated Eq. (1) up to 5(4) =148 
for the empirical data, predicting that the saturation can be reached only after t s = 5 years, 
beyond our data horizon. Note that the MSD saturation could also be rooted in the finite size 
of the country (i.e. the finite number of towers a user can visit). Yet, scaling arguments (see 
Supplementary Material S6) indicates that the saturation time for S due to the finite size 
effects is approximately 10,000 years. Therefore, the saturation of MSD is rooted in the need 
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to return to the most visited locations, forcing the new locations explored by a user to be 
close to user's most visited location. 

Another key empirical observation is population heterogeneity: the radius of gyration 
r g of the trajectory of different individuals is found to follow a fat tailed distribution (see Fig. 
4a and Ref. 14). As we show in Fig. 4c, we find that our model can reproduce this feature as 
well, indicating that the fat tailed P{r g ) is a consequence of the inherent fluctuations present 
within the model and it is rooted in the P(Ar) distribution. Indeed, in Supplementary Section 
S4 we show that the tail of P(r g ) and P(Ar) are expected to share same exponent 1+oc. 
Furthermore, we find that the model reproduces not only the P{r g ) distribution, but the ultra- 
slow growth of r g as well (Fig. 4d), in agreement with the empirical data (Fig. 4b). 

It is important to note that in contrast with the traditional RW, LF or CTRW models, 
our model is dynamically quenched. That is, after an individual explores a new location, 
he/she will have an increasing tendency to return to it in the future, generating a recurrent and 
relatively stable mobility pattern for each individual. In principle one could also consider a 
model that assigns a quenched visitation variable to each site. Our approach not only avoids 
the need to parameterize such a model, but also achieves the trajectory selection dynamically, 
through its self-quenching character. 

We also note that the model is designed to capture the long term spatial and temporal 
scaling patterns, thus in its present from it does not reproduce the short time frame temporal 
order and correlations potentially present in individual mobility. Our choice to focus the 
asymptotic properties is driven not only by theoretical arguments (we aim to reproduce the 
universal and not the transient patterns), but also by practical considerations: many human 
mobility driven processes, from epidemic spreading to city planning, are driven by the 
asymptotic characteristics of human mobility. To achieve a better short range temporal 
fidelity, we need to incorporate the periodic modulations that are known to characterize 
human mobility (there is a 24-hour and 7 days periodicity in human mobility; individuals are 
less likely to change locations during night and quite mobile in the morning and late 
afternoon, see Supplementary Material S7) as well potential correlations in spatial mobility 
(i.e. if location B is between locations A and C in space, the likely order of visitation will be 
A— »B— >C or C— >B— >A). These correlations further constrain the human trajectories, being 
partly responsible for the high degree of the predictability characterizing individual mobility 
patterns . Finally, we note that the dynamical quenching and the recurrent behaviour is 
unique to human trajectories, and does not restrict banknote diffusion, or foraging behaviour 
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' ' . As such, our model represents an improvement over the CTRW/LF models, as it is 
adapted to capture the specifics of human mobility. By reproducing the basic scaling laws 
characterizing human trajectories, the current model offers a conceptual framework that has 
the flexibility to absorb future extensions, potentially improving the temporal fidelity of its 
short term dynamics as well. 

Methods — Mean Square Displacement (MSD): Due to the significant population 
heterogeneity of P(r g ), MSD averaged over all users diverges and thus is ill-defined. We 
therefore grouped users based on their r g , guaranteeing that the MSDs defined within groups 
do not diverge. We calibrate t = to correspond to the moment when an individual leaves 
his/her most visited location. In order to compare with the analytical prediction (7), the 
displacement is counted only when the individual explores a new site. This methodology does 
not affect the scaling behaviour of regular random walk based models. 

Number of Distinct Locations (S): Similar to the MSD, S(t) is the average over different r g 
groups. Given the large population in our dataset, the standard error of mean is quite small 
and the error bar in Fig. la is covered by the symbols. The error bar in ju mainly comes from 
fitting of the power law, whose correlation coefficient r = 0.998 and the /7-value associated 
with the fit is less than 10" 6 . 
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Fig. 1 : Empirical results vs. the predictions of the individual mobility (IM) model, (a) 

The number of visited distinct locations S(t) vs. time for different r g groups, indicating that 
S{t) grows as f, with p ~ 0.6+0.02 (straight line). The dashed line represents the prediction of 
CTRW S(t) ~ r with /? ~ 0.8. (b) Zipf s plot showing the visitation frequency fk of the A>th 
most visited location of a user for different S values. The empirical data is well approximated 

by ft ~ k , where q ~ 1.2+0.1. (c) Time evolution of the MSD ((Ax j ) in log-log scale for 

user groups with different radius of gyration r g , where the MSDs are normalized by their 
value at t = 1 year. The orange curve represents the analytical prediction for the asymptotic 

behavior. The grey line represents the analytical prediction of CTRW, (Ax ) ~ r with ft = 

0.8. Inset: the normalized MSD vs. S for different r g groups, where the black curve represents 
the analytical prediction, (d) The number of visited distinct locations S(t) vs. time in a log-log 
plot, as predicted by the IM model with different /? values (a = 0.5, y = 0.2 and p = 0.1). The 
straight lines represent the analytical prediction. Inset: the EVI model prediction for different p 
values (a = 0.5, /? = 0.6 and y = 0.2. (e) Zipf s plot showing the visitation frequency fk in the 
IM model with different y and p values (a = 0.5, /? = 0.6). The straight lines show the 
analytical prediction. The grey symbols correspond to a democratic model in which 
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preferential attachment is absent, (f) the MSD ((Ax j ) vs. the number of visited distinct 

locations S(t) in a log-linear scale. The symbols correspond to simulations with different £ 
values, where £ = corresponds to the democratic model. The continuous lines represent the 
analytical prediction. 



Time: t+At 




S= 4 



Fig. 2: Schematic description of the Individual Mobility (IM) model. Starting at time t 
from the configuration shown in the left panel, indicating that the user visited previously S = 
4 locations with frequency^ that is proportional to the size of circles drawn at each location, 
at time t + At (with A£ drawn from the P(Af) fat tailed distribution) the user can either (i) 
Exploration (upper panel): visit a new location at distance Ar from its current location, where 
Ar is chosen from P(Ar) fat tailed distribution, or (ii) Preferential return (lower panel): return 
to a previously visited location with probability P ret = 1 - pS~ 7 , where the next location will be 
chosen with probability IT =/,. 
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Fig 3: Testing the hypotheses behind the IM model, (a) The changes in the number of 
distinct locations AS/(AS) visited by the users vs. the number of distinct locations S visited 
previously, indicating that P new ~ AS ~ S~ y with y = 0.21±0.02. The error bars correspond to 
standard error of the mean (SEM). (b) The probability to return to a previously visited 
location n in function of the previous frequency of visitation f, offering evidence for 
preferential attachment H=f. The error bars associate with SEM. (c) The probability density 
function (PDF) P(p) for the user group, finding that P(p), where p is the model parameter (see 
Fig. 2), approximately follows a normal distribution with mean p ~ 0.6. 
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Fig. 4: (a) The P(r g ) distribution of the radius of gyration r g for the mobile phone users at 
different moments of time. The straight line, shown as a guide to the eye, represents a power 
law decay with exponent 1+a ~ 1.55. (b) The time evolution of radius of gyration r g (f) for the 
mobile phone users, where r g (f) is normalized by its maximal value r g (T) at time T '= 1 year. 
The orange curve is a guide to the eye, following (log f) . (c) The distribution of radius of 
gyration P{r g ) for the IM model using a = 0.75, /? = 0.6, y = 0.2 and p = 0.1, the values found 
to be of direct relevance to human mobility, (d) The time evolution of radius of gyration r g (f) 
for the EVI model with the same parameters as (c), where r g (t) is normalized by its value r g (T) 

6 2 

at T= 10 . The green curve is a guide to the eye, following (log t) . 
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